Adapting K-Medians to Generate Normalized Cluster Centers
Abstract
Many clustering applications, such as text or mass spectra mining, require normalized data. The spherical K-means algorithm [6], an adaptation of the traditional K-means algorithm, is highly useful for data of this kind because it produces normalized cluster centers. The K-medians clustering algorithm is also an important clustering tool because of its well-known resistance to outliers. K-medians, however, is not trivially adapted to produce normalized cluster centers. We introduce a new algorithm (called MN), inspired by spherical K-means, that integrates with K-medians clustering to produce locally optimal normalized cluster centers. We then show theoretically and experimentally that MN produces clusters of significantly higher quality than one would obtain via a simple scaling of the cluster centers produced from traditional K-medians.
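To make the comparison in the abstract concrete, the following is a minimal sketch of the two center computations it names for a single cluster: a spherical K-means style center (the normalized sum of the points) and the "simple scaling" baseline (the componentwise median, rescaled to unit length). It does not implement MN, which the abstract does not specify. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
import math
from statistics import median


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spherical_mean_center(points):
    """Spherical K-means style center: the normalized sum of the cluster's points."""
    dims = range(len(points[0]))
    return normalize([sum(p[d] for p in points) for d in dims])


def scaled_median_center(points):
    """Baseline only (not MN): componentwise median, then rescaled to unit length."""
    dims = range(len(points[0]))
    return normalize([median(p[d] for p in points) for d in dims])


# Hypothetical toy cluster: three unit vectors near (1, 0) plus one outlier at (0, 1).
points = [normalize(p) for p in [[1.0, 0.1], [1.0, 0.0], [1.0, -0.1], [0.0, 1.0]]]

mean_center = spherical_mean_center(points)
median_center = scaled_median_center(points)
print("normalized mean:  ", mean_center)
print("scaled median:    ", median_center)
```

On this toy cluster the scaled median stays closer to the three agreeing points than the normalized mean does, which illustrates the outlier resistance that motivates K-medians. Rescaling a median after the fact is only the baseline that the abstract reports MN improves upon.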
Similar resources
Generating Normalized Cluster Centers with k-Medians
Many applications of clustering require the use of normalized data, such as text or mass spectra mining. The spherical k-means algorithm [6], an adaptation of the traditional k-means algorithm, is highly useful for data of this kind because it produces normalized cluster centers. The k-medians clustering algorithm is also an important clustering tool because of its well-known resistance to outli...
Greedy bi-criteria approximations for k-medians and k-means
This paper investigates the following natural greedy procedure for clustering in the bi-criterion setting: iteratively grow a set of centers, in each round adding the center from a candidate set that maximally decreases clustering cost. In the case of k-medians and k-means, the key results are as follows. • When the method considers all data points as candidate centers, then selecting O(k log(1...
A New Algorithm for Cluster Initialization
Clustering is a very well-known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique are dependent on the initialization of cluster centers. In this article we propose a new algorithm to initialize the clusters. The proposed algorithm is based on finding a set of medians extracted from a dimension with maximu...
Normalized k-means clustering of hyper-rectangles
Interval variables can be measured on very different scales. We first recall a general methodology used for measuring the dispersion of a variable from an optimal center, and we define two measures of dispersion associated with two optimal "centers" for interval variables. Then we study the relations between the standardization of a data table and the use in clustering of a normalized distance. F...
Fast k-clustering Queries on Road Networks
In this article, we study the k-clustering query problem on road networks, an important problem in Geographic Information Systems. Using Euclidean embeddings and reduction to fast nearest neighbor search, we devise approximation algorithms for these problems. Since these problems are difficult to solve exactly – and even hard to approximate for most variants – we compare our constant factor app...